Abstract: The Dantzig selector (Candes and Tao, 2007) is a popular $\ell^1$-regularization
method for variable selection and estimation in linear regression. We present a very
weak geometric condition on the observed predictors which is related to parallelism
and, when satisfied, ensures the uniqueness of Dantzig selector estimators. The condition
holds with probability 1, if the predictors are drawn from a continuous distribution. We
discuss the necessity of this condition for uniqueness and also provide a closely related
condition which ensures uniqueness of lasso estimators (Tibshirani, 1996). Large sample
asymptotics for the Dantzig selector, i.e. almost sure convergence and the asymptotic
distribution, follow directly from our uniqueness results and a continuity argument.
The limiting distribution of the Dantzig selector is generally non-normal. Though our
asymptotic results require that the number of predictors is fixed (similar to (Knight and
Fu, 2000)), our uniqueness results are valid for an arbitrary number of predictors and
observations.

AMS 2000 subject classifications: Primary 62J05; secondary 62E20.

Keywords and phrases: Lasso, Regularized regression, Variable selection and estimation.

1. Introduction 

Regularized regression methods for variable selection and estimation have become an impor- 
tant tool for statisticians and have been the subject of intense statistical research during the 
past fifteen years (Bickel and Li, 2006; Fan and Lv, 2010; Tibshirani, 2011). These methods 
provide a tractable approach to the analysis of high-dimensional datasets and are especially 
useful when the underlying signal is sparse. In this paper, we address some gaps in the
literature, which pertain to uniqueness and large sample asymptotic theory for the Dantzig
selector (Candes and Tao, 2007), a popular $\ell^1$-regularized regression method that is
closely related to lasso (Tibshirani, 1996).

First, we develop an intuitive geometric condition related to parallelism which ensures 
that the Dantzig selector has a unique solution and demonstrate that this condition holds 
in an overwhelming majority of instances (with probability 1, if the predictors follow an 
absolutely continuous distribution with respect to Lebesgue measure). We also give a related 
necessary condition for the uniqueness of Dantzig selector solutions. These results originally 
appeared in the first author's PhD thesis (Dicker, 2010) and, to our knowledge, are the 
first uniqueness results about the Dantzig selector to be found in the literature. In fact, our 
uniqueness condition for the Dantzig selector is easily translated into a similar prevalent 
condition which implies that lasso has a unique solution. 

Aside from their independent interest, the uniqueness results presented here pave the way 
for a simple derivation of the almost sure limit and the asymptotic distribution of Dantzig 
selector estimators, when the number of predictors, p, is fixed (on the other hand, we em- 
phasize that our uniqueness results are valid for arbitrary p). These asymptotic results are 
analogous to those found in (Knight and Fu, 2000) for the lasso and further highlight sim- 
ilarities between the two methods, which have been discussed by multiple authors (James 
et al., 2009; Meinshausen et al., 2007). In fact, in comparison with Knight and Fu's [2000] 
results, uniqueness appears to be the major hurdle to obtaining large sample asymptotics for 
the Dantzig selector. The Dantzig selector is a convex - but not strictly convex - optimization 
problem. Thus, unique solutions are not guaranteed in general. However, once uniqueness is 
understood, asymptotic results for the Dantzig selector follow directly from continuity argu- 
ments. More specifically, we show that under the given uniqueness conditions the Dantzig 
selector may be viewed as a well-defined continuous mapping; asymptotic results then follow 
from the continuous mapping theorem. By contrast, for the lasso, uniqueness is assured in 
classical fixed p asymptotic analyses because the associated optimization problem is strictly 
convex (provided the predictors are non-degenerate). The foregoing discussion highlights the 
potential usefulness of uniqueness results for the Dantzig selector. More broadly, understand- 
ing uniqueness makes certain powerful tools - like the continuous mapping theorem - readily 
available for further analysis of the Dantzig selector. 

Though much of the recent interest in regularized regression methods is spurred by
applications that may perhaps be best approximated by an asymptotic regime where
$p \to \infty$, we believe that it remains important to understand classical large sample
asymptotics, where $p$ is fixed and $n \to \infty$, in order to obtain a more complete
understanding of these procedures. This paper helps shed light on this issue. Moreover, we
believe that our uniqueness results, which are valid for all $p$, may be useful for
formulating and deriving asymptotic results for regularized regression methods in settings
where $p \to \infty$; however, this is a topic for future research and is beyond the scope
of this paper (though it is briefly addressed again in our concluding Section 5).

The rest of this paper proceeds as follows. In Section 2 we introduce notation and def- 
initions. In Section 3 we discuss uniqueness. Propositions 1 and 2 are the main results in 
Section 3 and summarize important uniqueness properties of the Dantzig selector and lasso 
vis-a-vis parallelism. In Section 4, we show that the Dantzig selector may be viewed as a 
continuous mapping from the space of predictors and associated outcomes to the space of 
parameter estimates (Proposition 3). Corollaries 1 and 2 give the almost-sure limit of Dantzig 
selector estimators and their asymptotic distribution, respectively. Section 5 contains a brief 
concluding discussion. Proofs may be found in the Appendix at the end of the paper. 

2. Notation and definitions 

Consider the linear model

$$y_i = x_i^T \beta^* + \epsilon_i, \quad i = 1, \dots, n, \qquad (1)$$

where $y_1, \dots, y_n \in \mathbb{R}$ and $x_1, \dots, x_n \in \mathbb{R}^p$ are observed
outcomes and predictors, respectively, $\epsilon_1, \dots, \epsilon_n$ are unobserved iid
integrable random variables with mean $E(\epsilon_i) = 0$, and
$\beta^* = (\beta_1^*, \dots, \beta_p^*)^T \in \mathbb{R}^p$ is an unknown parameter to be
estimated. To simplify notation, let $y = (y_1, \dots, y_n)^T \in \mathbb{R}^n$ denote the
$n$-dimensional vector of outcomes and $X = (x_1, \dots, x_n)^T$ denote the $n \times p$
matrix of predictors. Also let $\epsilon = (\epsilon_1, \dots, \epsilon_n)^T \in \mathbb{R}^n$.
Then (1) may be re-expressed as

$$y = X\beta^* + \epsilon.$$

It will be useful to have a concise method for referring to sub-vectors and sub-matrices
of various vectors and matrices. For a vector $\beta = (\beta_1, \dots, \beta_p)^T \in \mathbb{R}^p$
and a subset $A \subseteq \{1, \dots, p\}$, let $\beta_A = (\beta_j)_{j \in A} \in \mathbb{R}^{|A|}$.
Furthermore, for $n \times p$ matrices $X = (x_{ij})_{1 \le i \le n,\, 1 \le j \le p}$, let
$X_A = (x_{ij})_{1 \le i \le n,\, j \in A}$ denote the $n \times |A|$ matrix obtained from $X$
by extracting columns corresponding to elements of $A$. If $C = (c_{ij})_{1 \le i,j \le p}$ is a
$p \times p$ matrix, and $B \subseteq \{1, \dots, p\}$ has cardinality $|B|$, let
$C_{A,B} = (c_{ij})_{i \in A,\, j \in B}$ denote the $|A| \times |B|$ matrix obtained from $C$ by
extracting rows corresponding to elements of $A$ and columns corresponding to elements of $B$.
For $j \in \{1, \dots, p\}$, let $X_j = X_{\{j\}}$ denote the $j$-th column of $X$. Finally, let
$\mathrm{null}(C)$ denote the null-space of the matrix $C$ and let $\dim(V)$ denote the
dimension of the vector space $V$.
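As a small illustration of this indexing notation, the following Python sketch extracts a sub-vector, a column sub-matrix, and a row/column sub-matrix from nested lists. The helper names (`sub_vector`, `sub_columns`, `sub_matrix`) are hypothetical and are not taken from the paper, and indices are 0-based here, unlike the 1-based indices used in the text.

```python
def sub_vector(beta, A):
    """Return beta_A = (beta_j) for j in A (0-based indices, sorted order)."""
    return [beta[j] for j in sorted(A)]


def sub_columns(X, A):
    """Return X_A: the n x |A| matrix made of the columns of X indexed by A."""
    cols = sorted(A)
    return [[row[j] for j in cols] for row in X]


def sub_matrix(C, A, B):
    """Return C_{A,B}: rows of C indexed by A, columns of C indexed by B."""
    rows, cols = sorted(A), sorted(B)
    return [[C[i][j] for j in cols] for i in rows]


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
C = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]

print(sub_columns(X, {0, 2}))      # first and third columns of X
print(sub_matrix(C, {0}, {1, 2}))  # a 1 x 2 block of C
```

Sets are sorted before indexing so that the extracted blocks keep the original column and row order.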

The main object of study in this paper is the Dantzig selector - a linear programming
problem for obtaining estimates of $\beta^*$, which is defined as follows:

$$\text{minimize } \|\beta\|_1 \quad \text{subject to } \frac{1}{n}\|X^T(y - X\beta)\|_\infty \le \lambda, \qquad (2)$$

where $\lambda = \lambda_n \ge 0$ is a tuning parameter, $\|\beta\|_1 = \sum_{j=1}^p |\beta_j|$
denotes the $\ell^1$-norm and $\|X^T(y - X\beta)\|_\infty = \max_{1 \le j \le p} |X_j^T(y - X\beta)|$
denotes the $\ell^\infty$-norm. Solutions to (2), denoted $\hat{\beta}^{ds}$, will
be referred to as Dantzig selector estimators.
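To make the constraint in (2) concrete, the following Python sketch evaluates the scaled sup-norm $n^{-1}\|X^T(y - X\beta)\|_\infty$ for a candidate $\beta$ and checks feasibility. It is only an illustration: the function names and the toy data are assumptions made for this example, and no optimization is performed.

```python
def dantzig_constraint(X, y, beta):
    """Return (1/n) * max_j |X_j^T (y - X beta)|, the left side of the constraint in (2)."""
    n, p = len(X), len(X[0])
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
    return max(abs(c) for c in corr) / n


def is_feasible(X, y, beta, lam, tol=1e-12):
    """True if beta lies in the feasible set of (2) for tuning parameter lam."""
    return dantzig_constraint(X, y, beta) <= lam + tol


def l1_norm(beta):
    """The objective in (2)."""
    return sum(abs(b) for b in beta)


# Hypothetical toy data: n = p = 2 with an identity design.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 1.0]

print(dantzig_constraint(X, y, [0.0, 0.0]))   # constraint value at the origin
print(is_feasible(X, y, [1.0, 0.0], lam=1.0), l1_norm([1.0, 0.0]))
```

In this toy instance the constraint with $\lambda = 1$ reads $|3 - \beta_1| \le 2$ and $|1 - \beta_2| \le 2$, so the origin is infeasible while $(1, 0)$ is feasible with $\ell^1$-norm 1.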

We also introduce the lasso optimization problem and estimator at this time:

$$\hat{\beta}^{lasso} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\beta\|^2 + \lambda\|\beta\|_1, \qquad (3)$$

where $\|y - X\beta\|^2 = \sum_{i=1}^n (y_i - x_i^T\beta)^2$ is the squared $\ell^2$-norm. Though the
lasso is not our primary concern in this paper, we will sometimes find it instructive to compare
aspects of the Dantzig selector and lasso side-by-side. For instance, as discussed in the
Introduction, notice that if $X$ has rank $p$, then lasso is a strictly convex optimization
problem, which ensures that $\hat{\beta}^{lasso}$ is unique. On the other hand, the Dantzig
selector (2) is a linear programming problem and uniqueness properties are less clear, even
when $X$ has rank $p$.

In order to provide some additional context for the present study, we point out that one
of the key features of both the Dantzig selector and lasso is that they perform simultaneous
variable selection and estimation. By this we mean that $\{j;\ \hat{\beta}_j^{ds} = 0\}$ and
$\{j;\ \hat{\beta}_j^{lasso} = 0\}$ are often non-empty (contrast this with the ordinary least
squares estimator for $\beta^*$). This implies that $\hat{\beta}^{ds}$ and $\hat{\beta}^{lasso}$
often have reduced dimension (i.e., only a few non-zero entries) and can greatly enhance
interpretability, along with estimation accuracy (Bickel et al., 2009; Candes and Tao, 2007;
Tibshirani, 1996).

3. Parallelism and uniqueness 

Parallelism plays a large role in the discussion of uniqueness of Dantzig selector solutions.
Roughly speaking, the Dantzig selector has a unique solution if the feasible set,

$$F = \{\beta;\ n^{-1}\|X^T(y - X\beta)\|_\infty \le \lambda\} \subseteq \mathbb{R}^p,$$

is not parallel to the $\ell^1$-ball. Below, we describe parallelism as a geometric concept
which is relevant to the Dantzig selector and then give a more formal definition.

First note that the feasible set $F$ is polyhedral (it is the intersection of finitely many
closed half-spaces). Solutions of the Dantzig selector are points $\beta \in F$ of minimal
$\ell^1$-norm. Let $B_1 = \{u \in \mathbb{R}^p;\ \|u\|_1 \le 1\}$ be the closed unit $\ell^1$-ball
centered at the origin. Geometrically, we can find solutions to the Dantzig selector by "growing"
$tB_1 = \{u \in \mathbb{R}^p;\ \|u\|_1 \le t\}$, $t \ge 0$, until it intersects $F$; the points of
intersection are Dantzig selector solutions. More precisely, let $t_0 = \|\hat{\beta}^{ds}\|_1$.
The collection of all Dantzig selector solutions is $F \cap t_0 B_1$. When $p = 2$, the
1-dimensional faces of $tB_1$ have slope 1 or $-1$; the Dantzig selector has multiple solutions
only if a 1-dimensional face of $F$ has slope 1 or $-1$, that is, only if $F$ is parallel to the
$\ell^1$-ball.

Fig 1. An instance of the Dantzig selector with multiple solutions. The region $F$ is the
feasible set for the Dantzig selector and
$\|\hat{\beta}\|_1 B_1 = \{u \in \mathbb{R}^2;\ \|u\|_1 \le \|\hat{\beta}\|_1\}$. The bold line
represents the intersection of $\|\hat{\beta}\|_1 B_1$ with $F$ and is the solution set for this
instance of the Dantzig selector.

As indicated by the situation when $p = 2$, if the Dantzig selector has multiple solutions,
then $F$ is parallel to $B_1$ (Figure 1). When $p > 2$, the notion of parallelism which is correct
for our purposes is less straightforward. Geometric intuition suggests that parallelism is
invariant under translation and scalar multiplication, in the sense that $F$ is parallel to $B_1$
if and only if $aF + v_0 = \{a\beta + v_0;\ \beta \in F\}$ is parallel to $B_1$ for
$a \in \mathbb{R} \setminus \{0\}$ and $v_0 \in \mathbb{R}^p$. In particular, multiplying $X$ by a
(non-zero) scalar and adding vectors $y_0 \in \mathbb{R}^n$ to $y$ does not affect parallelism.
This leads to a definition of parallelism between $F$ and $B_1$ which depends only on the
matrix $n^{-1}X^TX$. In fact, in our view, the primitive concept is parallelism between a
$p \times p$ symmetric matrix $C$ and the $\ell^1$-ball.

Definition 1.

(a) Let $C$ be a $p \times p$ symmetric matrix. The matrix $C$ is parallel to the $\ell^1$-ball if
and only if the condition [Par] (found below) holds.

[Par] There exist subsets $A, B \subseteq \{1, \dots, p\}$ and a vector $w \in \mathbb{R}^{|B|}$
such that $\|C_B w\|_\infty \le 1$, $C_{A,B} w \in \{\pm 1\}^{|A|}$, and
$\dim[\mathrm{null}(C_{B,A})] > 0$.

(b) The feasible set for the Dantzig selector, $F$, is parallel to the $\ell^1$-ball if and only
if $n^{-1}X^TX$ is parallel to the $\ell^1$-ball.

Remarks (i) Parallelism, as defined here, is related to degenerate sub-matrices of $C$, which,
in the context of the Dantzig selector, correspond to the nontrivial faces of $F$. In [Par], the
requirement that $C_{A,B} w \in \{\pm 1\}^{|A|}$ is related to the fact that the faces of the
$\ell^1$-ball, $B_1$, have normal vectors $u \in \mathbb{R}^p$ where $u_A \in \{\pm 1\}^{|A|}$ for
some $A \subseteq \{1, \dots, p\}$.

(ii) When $p = 2$, it is easy to see that $F$ is parallel to the $\ell^1$-ball if and only if one
of the columns of $n^{-1}X^TX$ is a scalar multiple of some point in $\{\pm 1\}^2$. This occurs if
and only if a one-dimensional face of $F$ has slope 1 or $-1$, as depicted in Figure 1.
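The criterion in Remark (ii) is easy to check numerically. The sketch below is a hypothetical helper, not code from the paper: it tests whether some column of a symmetric $2 \times 2$ matrix has two entries of equal absolute value. It reads "scalar multiple" as a non-zero multiple (an assumption of this example, so an all-zero column does not qualify) and uses a small tolerance for floating-point comparisons.

```python
def gram(X):
    """Return n^{-1} X^T X for an n x 2 matrix X given as a list of rows."""
    n = len(X)
    return [[sum(r[i] * r[j] for r in X) / n for j in range(2)] for i in range(2)]


def is_parallel_2x2(C, tol=1e-12):
    """Remark (ii) criterion for p = 2: some non-zero column of C is a scalar
    multiple of a point in {+1, -1}^2, i.e. its entries have equal absolute value."""
    for j in range(2):
        a, b = C[0][j], C[1][j]
        if abs(a) > tol and abs(abs(a) - abs(b)) <= tol:
            return True
    return False


# Columns x_1 = (1, 0) and x_2 = (1, 1): the first column of n^{-1} X^T X is (0.5, 0.5).
X_parallel = [[1.0, 1.0], [0.0, 1.0]]
# Orthogonal columns: n^{-1} X^T X is a multiple of the identity.
X_generic = [[1.0, 0.0], [0.0, 1.0]]

print(is_parallel_2x2(gram(X_parallel)))
print(is_parallel_2x2(gram(X_generic)))
```

The first design is a deliberately constructed degenerate case; an exact tie between the absolute values of two entries is the kind of event that a continuous distribution on the predictors rules out almost surely.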

As discussed above, parallelism is invariant under translation and scalar multiplication.
On the other hand, translation and scalar multiplication of the feasible set $F$ gives rise to
various instances of the Dantzig selector, some with a unique solution and some, perhaps,
with multiple solutions. This suggests that any sufficient condition for the existence of multiple
Dantzig selector solutions must, unlike parallelism, involve $y$ and $\lambda$. To illustrate this
concept, suppose that $n^{-1}X^TX$ is invertible and is parallel to the $\ell^1$-ball. Figure 2 (a)
depicts $F_0 = (n^{-1}X^TX)^{-1}B_\infty = \{(n^{-1}X^TX)^{-1}u;\ u \in \mathbb{R}^2,\ \|u\|_\infty \le 1\}$,
which is equal to the feasible set for the Dantzig selector when $\lambda = 1$ and $y = 0$ and is
parallel to the $\ell^1$-ball. Figures 2 (b) and (c) depict $F_1$ and $F_2$, potential feasible
sets for the Dantzig selector that are both obtained from $F_0$ by scalar multiplication and
translation. The feasible sets $F_1$ and $F_2$ are both parallel to the $\ell^1$-ball, and
correspond to feasible sets for the Dantzig selector with the predictor matrix $X$ and different
values for $y$, $\lambda$ (not given here). The instance of the Dantzig selector with feasible set
$F_1$ has multiple solutions, while the Dantzig selector with feasible set $F_2$ has a unique
solution.

Fig 2. (a) $F_0 = (X^TX)^{-1}B_\infty$ is parallel to the $\ell^1$-ball, as evidenced by the bold
face $D$. (b) $F_1$ is obtained from $F_0$ by scalar multiplication and translation; the Dantzig
selector problem with feasible set $F_1$ has multiple solutions, indicated by the bold line
segment labeled $\hat{\beta}$. (c) $F_2$ is obtained from $F_0$ by scalar multiplication and
translation; the point labeled $\hat{\beta}$ is the unique solution to the Dantzig selector
problem with feasible set $F_2$.

The following condition combines parallelism with additional constraints and is a sufficient
condition for the existence of multiple Dantzig selector solutions.

[Mult] There exist subsets $A, B \subseteq \{1, \dots, p\}$ and vectors $\mu^0 \in \mathbb{R}^p$,
$\beta^0 \in F$, such that

1. $\|n^{-1}X^TX_B\mu^0_B\|_\infty \le 1$, $n^{-1}X_A^TX_B\mu^0_B \in \{\pm 1\}^{|A|}$, and
$\dim[\mathrm{null}(n^{-1}X_B^TX_A)] > 0$.

2. $n^{-1}\beta^TX^TX_B\mu^0_B \ge \|\beta^0\|_1$ for all $\beta \in F$.

3. $A = \{j;\ \beta^0_j \ne 0\}$.

4. $n^{-1}|X_j^T(y - X\beta^0)| = \lambda$ for all $j \in B$ and
$n^{-1}|X_j^T(y - X\beta^0)| < \lambda$ for all $j \notin B$.

Note that Condition 1 in [Mult] implies that $F$ is parallel to the $\ell^1$-ball. Conditions 2-4
in [Mult] constrain the location of $F$ in $\mathbb{R}^p$ relative to the origin. Proposition 1
below characterizes uniqueness properties of the Dantzig selector in terms of [Par] and [Mult].
A related necessary condition for the existence of multiple lasso solutions is given in
Proposition 1 (c). Proposition 1 is proved in the Appendix at the end of this paper.

Proposition 1.

(a) If [Mult] holds, then the Dantzig selector has multiple solutions.

(b) If $F$ is not parallel to the $\ell^1$-ball, then the Dantzig selector has a unique solution.

(c) Suppose that $\lambda > 0$ and that the lasso has multiple solutions (i.e.
$\operatorname{argmin}_{\beta \in \mathbb{R}^p} (2n)^{-1}\|y - X\beta\|^2 + \lambda\|\beta\|_1$
contains more than a single element). Then there exists a subset $A \subseteq \{1, \dots, p\}$
and a vector $w \in \mathbb{R}^p$ such that $\|n^{-1}X^TXw\|_\infty \le 1$,
$n^{-1}X_A^TXw \in \{\pm 1\}^{|A|}$, and $\dim[\mathrm{null}(n^{-1}X^TX_A)] > 0$.

Remarks (i) Proposition 1 is valid for any $n$ and $p$.

(ii) Proposition 1 (c) may be rephrased as follows. If the lasso has multiple solutions, then
$n^{-1}X^TX$ is parallel to the $\ell^1$-ball and, moreover, one may take $B = \{1, \dots, p\}$ in
the definition of parallelism.

(iii) If $\lambda = 0$, then the lasso has multiple solutions whenever $n^{-1}X^TX$ is singular.

(iv) The condition in Proposition 1 (c) implies that $n^{-1}X^TX$ is parallel to the $\ell^1$-ball.
It follows that if $n^{-1}X^TX$ is not parallel to the $\ell^1$-ball, then both the Dantzig
selector and lasso have unique solutions. The relationship between uniqueness for the Dantzig
selector and uniqueness for lasso is discussed by Meinshausen et al. (2007), who give a concrete
$p = 3$-dimensional example (with pictures) where lasso has a unique solution, but the Dantzig
selector does not.

(v) A condition similar to [Mult] which ensures the existence of multiple lasso solutions may
be developed. This is not pursued further here.

The next proposition suggests that the Dantzig selector and lasso have a unique solution 
in an overwhelming majority of instances. 

Proposition 2. Suppose that $x_1, \dots, x_n$ are iid and drawn from a continuous distribution
with respect to Lebesgue measure on $\mathbb{R}^p$. Then $n^{-1}X^TX$ is parallel to the
$\ell^1$-ball with probability 0. Consequently, the Dantzig selector and lasso have a unique
solution with probability 1.

Remarks (i) Proposition 2 is proved in the Appendix (a proof also appears in (Dicker,
2010)). To provide some intuition, note that the parallelism condition requires $n^{-1}X_A^TX_B$
to both (i) contain a specific point in its range (that is, an element of $\{\pm 1\}^{|A|}$) and
(ii) have a degenerate range (in the sense that $\dim[\mathrm{null}(n^{-1}X_B^TX_A)] > 0$).
Proposition 2 implies that this occurs with probability 0, under the specified conditions.

4. Large sample asymptotics for the Dantzig selector 

Throughout the rest of this article, assume that $p$ and $\beta^* \in \mathbb{R}^p$ are fixed. In
this section, we formulate the Dantzig selector as a well-defined mapping from sample covariance
matrices, $n^{-1}X^TX$, marginal covariances, $n^{-1}X^Ty$, and tuning parameters,
$\lambda \ge 0$, to estimators, $\hat{\beta}^{ds}$. To do this, we restrict our attention to
symmetric matrices that are not parallel to the $\ell^1$-ball - Proposition 2 suggests that this
restriction is fairly weak. Then, we show that the Dantzig selector mapping is continuous. With
this machinery in place, large sample asymptotics for the Dantzig selector follow easily.

Let $\mathcal{P}_0$ denote the collection of $p \times p$ positive semidefinite matrices that are
not parallel to the $\ell^1$-ball and let $\mathcal{P}_0^+ = \mathcal{P}_0 \cap GL(p)$, where
$GL(p)$ is the collection of all invertible $p \times p$ matrices with real entries. Define the
Dantzig selector mapping
$G : \mathcal{P}_0^+ \times \mathbb{R}^p \times \mathbb{R}_{\ge 0} \to \mathbb{R}^p$ by
$G(C, v, \lambda) = u$, where $u$ solves the optimization problem

$$\text{minimize } \|u\|_1 \quad \text{subject to } \|Cu - v\|_\infty \le \lambda. \qquad (4)$$

It follows directly from Proposition 1 (b) that $G$ is well-defined. Furthermore, notice that
$G(n^{-1}X^TX, n^{-1}X^Ty, \lambda) = \hat{\beta}^{ds}$. Note that the domain of $G$ may be extended
to a subset of $\mathcal{P}_0 \times \mathbb{R}^p \times \mathbb{R}_{\ge 0}$, provided one imposes
conditions to ensure that the feasible set in the optimization problem (4) is non-empty. More
specifically, define $\mathcal{D} = \{(C, v);\ C \in \mathcal{P}_0,\ v \in \mathrm{range}(C)\}$.
Then (4) defines $G(C, v, \lambda)$ for $(C, v, \lambda) \in \mathcal{D} \times \mathbb{R}_{\ge 0}$.
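For $p = 2$, problem (4) is small enough to solve by brute force, which gives a concrete way to evaluate the mapping. The Python sketch below is an illustration under stated assumptions, not the authors' method: the function name `dantzig_map_2d` is hypothetical, and the approach relies on the fact that, within each quadrant, the $\ell^1$-norm is linear, so a minimizer can be found among the pairwise intersections of the four constraint boundary lines and the two coordinate axes.

```python
from itertools import combinations


def dantzig_map_2d(C, v, lam, tol=1e-9):
    """Sketch of G(C, v, lam) for p = 2 by enumerating candidate vertices.

    Minimizes |u1| + |u2| subject to max_j |(C u - v)_j| <= lam. Returns one
    minimizer (not all of them if the solution is not unique), or None if no
    candidate vertex is feasible.
    """
    # Each line is stored as (a1, a2, b), meaning a1*u1 + a2*u2 = b.
    lines = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]           # coordinate axes
    for j in range(2):
        for sign in (1.0, -1.0):                          # constraint boundaries
            lines.append((C[j][0], C[j][1], v[j] + sign * lam))

    def feasible(u):
        return all(abs(C[j][0] * u[0] + C[j][1] * u[1] - v[j]) <= lam + tol
                   for j in range(2))

    best, best_norm = None, float("inf")
    for (a1, a2, b), (c1, c2, d) in combinations(lines, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < tol:                                # no unique intersection
            continue
        u = ((b * c2 - a2 * d) / det, (a1 * d - b * c1) / det)   # Cramer's rule
        norm = abs(u[0]) + abs(u[1])
        if feasible(u) and norm < best_norm - tol:
            best, best_norm = u, norm
    return best


print(dantzig_map_2d([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0], 2.0))
print(dantzig_map_2d([[2.0, 1.0], [1.0, 2.0]], [3.0, 3.0], 0.0))
```

With the identity matrix, $v = (3, 1)$ and $\lambda = 2$, the feasible set is the box $[1, 5] \times [-1, 3]$ and the point of smallest $\ell^1$-norm is $(1, 0)$. With $\lambda = 0$ and invertible $C$ the feasible set is the single point $C^{-1}v$. Vertex enumeration does not scale beyond very small $p$; a general-purpose linear programming solver would be the natural choice there.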

Proposition 3. The mapping $G$ is continuous on
$\mathcal{P}_0^+ \times \mathbb{R}^p \times \mathbb{R}_{\ge 0}$.

Remarks (i) A proof of Proposition 3 is found in the Appendix. A similar proof shows that $G$ is
also continuous on $\mathcal{D} \times \mathbb{R}_{>0}$. In other words, assuming that the
appropriate (anti-) parallelism conditions hold, if there is non-trivial regularization in the
limit (i.e. $\lambda_n \to \lambda_0 > 0$), then the Dantzig selector is continuous, regardless of
whether or not the predictors and the limiting sample covariance matrix are singular.

Corollary 1. Suppose that $n^{-1}X^TX \to C \in \mathcal{P}_0^+$ and that
$\lambda_n \to \lambda_0 \ge 0$. Then $\hat{\beta}^{ds} \to \beta^0$, almost surely, where
$\beta^0$ solves

$$\text{minimize } \|\beta\|_1 \quad \text{subject to } \|C(\beta - \beta^*)\|_\infty \le \lambda_0.$$

Remarks (i) The corollary follows directly from Proposition 3, which implies that
$\hat{\beta}^{ds} = G(n^{-1}X^TX, n^{-1}X^Ty, \lambda_n) \to G(C, C\beta^*, \lambda_0) = \beta^0$,
almost surely.

(ii) Corollary 1 implies that under the given conditions, the Dantzig selector is consistent for
$\beta^*$ if and only if $\lambda_n \to 0$. Furthermore, it gives the almost sure limit of
$\hat{\beta}^{ds}$ in cases where the Dantzig selector is not consistent (that is, when
$\lambda_n \to \lambda_0 > 0$).
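The limit in Corollary 1 has a closed form in one simple special case, which makes the role of $\lambda_0$ visible. The sketch below assumes, purely for illustration, that the limiting matrix $C$ is diagonal with positive entries $c_1, \dots, c_p$; the constraint $\|C(\beta - \beta^*)\|_\infty \le \lambda_0$ then separates into $|c_j(\beta_j - \beta_j^*)| \le \lambda_0$ for each $j$, and the $\ell^1$-norm is minimized coordinate by coordinate. The function name `limit_diagonal` is hypothetical.

```python
def limit_diagonal(c_diag, beta_star, lam0):
    """Almost-sure limit beta^0 of Corollary 1 when C = diag(c_1, ..., c_p), c_j > 0.

    Each coordinate may move within lam0 / c_j of beta*_j, so |beta_j| is
    minimized by shrinking beta*_j toward zero by that amount.
    """
    out = []
    for c, b in zip(c_diag, beta_star):
        shrunk = max(abs(b) - lam0 / c, 0.0)
        out.append(0.0 if shrunk == 0.0 else (shrunk if b > 0 else -shrunk))
    return out


beta_star = [3.0, -0.5]
print(limit_diagonal([1.0, 2.0], beta_star, 0.0))   # lam0 = 0: the limit is beta*
print(limit_diagonal([1.0, 2.0], beta_star, 2.0))   # lam0 > 0: the limit is shrunk
```

In this special case the limit equals $\beta^*$ exactly when $\lambda_0 = 0$, in line with Remark (ii), and is a soft-thresholded version of $\beta^*$ otherwise.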

Corollary 2. Suppose that $E(\epsilon_i^2) = \sigma^2 < \infty$. Also assume that
$n^{-1}X^TX \to C \in \mathcal{P}_0^+$, that
$\lim_{n \to \infty} n^{-1}\max_{1 \le i \le n}\|x_i\|^2 = 0$, and that
$\sqrt{n}\lambda_n \to \lambda_0$. Let $A^* = \{j;\ \beta_j^* \ne 0\}$ and let $\bar{A}^*$ denote
the complement of $A^*$ in $\{1, \dots, p\}$. Then
$\sqrt{n}(\hat{\beta}^{ds} - \beta^*) \xrightarrow{d} u^0$, where $\xrightarrow{d}$ denotes
convergence in distribution, $u^0$ solves the optimization problem

$$\text{minimize } \|u_{\bar{A}^*}\|_1 + \mathrm{sign}(\beta^*)_{A^*}^T u_{A^*} \quad \text{subject to } \|Cu - v^0\|_\infty \le \lambda_0, \qquad (5)$$

and $v^0 \sim N(0, \sigma^2 C)$.

Corollary 2 is proved in the Appendix.

Remarks (i) The second moment condition on $\epsilon_i$ and the condition
$n^{-1}\max_{1 \le i \le n}\|x_i\|^2 \to 0$ ensure that $n^{-1/2}X^T\epsilon$ is asymptotically
normal.

(ii) If $\lambda_0 = 0$, then $\hat{\beta}^{ds}$ has the same asymptotic distribution as the
ordinary least squares estimator. If $\lambda_0 > 0$, then the limiting distribution of the
Dantzig selector is not normal.

(iii) Corollary 2 should be compared with Theorem 2 of (Knight and Fu, 2000), which describes
the limiting distribution of $\hat{\beta}^{lasso}$. Though the limiting distribution of lasso is
determined by an unconstrained optimization problem, the term
$\|u_{\bar{A}^*}\|_1 + \mathrm{sign}(\beta^*)_{A^*}^T u_{A^*}$ in the limiting optimization problem
for the Dantzig selector (5) also appears in the limiting optimization problem for lasso.
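When $p = 1$, problem (5) can be solved by hand, which gives a quick way to see why the limit is non-normal for $\lambda_0 > 0$. The sketch below is an illustration only: it assumes $C = c > 0$ is a scalar, and the names `solve_limit_1d` and `limit_draw_1d` are hypothetical. The feasible set is the interval $[(v^0 - \lambda_0)/c,\ (v^0 + \lambda_0)/c]$. If $\beta^* \ne 0$ the objective is linear and is minimized at an endpoint; if $\beta^* = 0$ the objective is $|u|$ and is minimized at the point of the interval closest to zero.

```python
import random


def solve_limit_1d(c, v0, beta_star, lam0):
    """Solve problem (5) for p = 1 with C = c > 0, given a realized v^0."""
    if beta_star != 0:
        s = 1.0 if beta_star > 0 else -1.0
        # Objective s * u is linear: take the endpoint on the side opposite to s.
        return (v0 - s * lam0) / c
    # Objective |u|: the feasible point closest to the origin (soft thresholding).
    mag = max(abs(v0) - lam0, 0.0)
    return 0.0 if mag == 0.0 else (mag if v0 > 0 else -mag) / c


def limit_draw_1d(c, sigma, beta_star, lam0, rng):
    """Draw u^0 by sampling v^0 ~ N(0, sigma^2 c) and solving (5)."""
    v0 = rng.gauss(0.0, sigma * c ** 0.5)
    return solve_limit_1d(c, v0, beta_star, lam0)


rng = random.Random(0)
draws = [limit_draw_1d(1.0, 1.0, 0.0, 0.5, rng) for _ in range(2000)]
zero_fraction = sum(1 for u in draws if u == 0.0) / len(draws)
print("fraction of exact zeros when beta* = 0:", zero_fraction)
```

In this scalar case the limit for $\beta^* \ne 0$ is a normal variable shifted by $-\mathrm{sign}(\beta^*)\lambda_0/c$, while for $\beta^* = 0$ it has a point mass at zero whenever $\lambda_0 > 0$. Setting $\lambda_0 = 0$ gives $v^0/c$ in both cases.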
5. Discussion

The results in this paper address fairly long-standing open questions about uniqueness for 
the Dantzig selector and lasso. To summarize, we prove that the Dantzig selector and lasso 
estimators are unique in almost all instances. Though these results may appear to be somewhat 
esoteric, Proposition 2 and its corollaries demonstrate their potential usefulness. Indeed, we 
have shown that once uniqueness is understood, it is straightforward to obtain the almost 
sure limit and limiting distribution of Dantzig selector estimators. Taking a broader view, 
the results presented here may help clear the path for a more operator theoretic approach 
to studying the Dantzig selector, lasso, and other regularized regression procedures. Such an 
approach may offer additional insights into properties of these methods in a variety of settings. 
For instance, one could potentially obtain a better understanding of the Dantzig selector in an
asymptotic regime where $p \to \infty$, which is often of particular interest in regularized regression
problems, by defining the Dantzig selector operator on an appropriate infinite dimensional 
space (analogous to the operator G defined in Section 4 above) and studying its continuity 
properties in this more abstract setting. Future research in this direction is needed. 

Appendix 

Proof of Proposition 1. The following two lemmas establish the Karush-Kuhn-Tucker
(KKT) conditions for the Dantzig selector and lasso optimization problems. The lemmas
appear in various forms in several references, including (Efron et al., 2007), (Asif, 2008),
(Dicker, 2010), and (Asif and Romberg, 2010), and proofs are omitted.

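Lemma A1. The vector $\hat{\beta} = \hat{\beta}^{ds} \in \mathbb{R}^p$ is a solution to the
Dantzig selector (2) if and only if there is $\hat{\mu} \in \mathbb{R}^p$ such that

$$n^{-1}\|X^T(y - X\hat{\beta})\|_\infty \le \lambda \qquad (6)$$

$$n^{-1}\|X^TX\hat{\mu}\|_\infty \le 1 \qquad (7)$$

$$n^{-1}\hat{\mu}^TX^TX\hat{\beta} = \|\hat{\beta}\|_1 \qquad (8)$$

$$n^{-1}\hat{\mu}^TX^T(y - X\hat{\beta}) = \lambda\|\hat{\mu}\|_1. \qquad (9)$$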
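Conditions (6)-(9) can be verified mechanically for a candidate pair $(\hat{\beta}, \hat{\mu})$. The following Python sketch is a hypothetical checker written for this purpose: it works with $C = n^{-1}X^TX$ and $v = n^{-1}X^Ty$ rather than with $X$ and $y$ directly, and the toy instance is an assumption of this example.

```python
def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def check_dantzig_kkt(C, v, lam, beta, mu, tol=1e-9):
    """Check conditions (6)-(9) with C = n^{-1} X^T X (symmetric) and v = n^{-1} X^T y."""
    resid = [vj - cj for vj, cj in zip(v, matvec(C, beta))]   # n^{-1} X^T (y - X beta)
    c_mu = matvec(C, mu)                                      # n^{-1} X^T X mu
    return {
        "(6)": max(abs(r) for r in resid) <= lam + tol,
        "(7)": max(abs(c) for c in c_mu) <= 1 + tol,
        "(8)": abs(dot(c_mu, beta) - sum(abs(b) for b in beta)) <= tol,
        "(9)": abs(dot(mu, resid) - lam * sum(abs(m) for m in mu)) <= tol,
    }


C = [[1.0, 0.0], [0.0, 1.0]]
v = [3.0, 1.0]
print(check_dantzig_kkt(C, v, 2.0, beta=[1.0, 0.0], mu=[1.0, 0.0]))  # optimal pair
print(check_dantzig_kkt(C, v, 2.0, beta=[2.0, 0.0], mu=[1.0, 0.0]))  # feasible, not optimal
```

In the toy instance the pair $\hat{\beta} = (1, 0)$, $\hat{\mu} = (1, 0)$ satisfies all four conditions, while the feasible but suboptimal point $(2, 0)$ fails (9) with the same $\hat{\mu}$.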



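Lemma A2. The vector $\hat{\beta} = \hat{\beta}^{lasso} \in \mathbb{R}^p$ is a solution to the
lasso optimization problem (3) if and only if

$$n^{-1}X_j^T(y - X\hat{\beta}) = \lambda\,\mathrm{sign}(\hat{\beta}_j) \quad \text{if } \hat{\beta}_j \ne 0,$$

$$|n^{-1}X_j^T(y - X\hat{\beta})| \le \lambda \quad \text{if } \hat{\beta}_j = 0.$$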
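The conditions of Lemma A2 can be checked in the same way. The sketch below is a hypothetical helper that again takes $C = n^{-1}X^TX$ and $v = n^{-1}X^Ty$ as inputs, so that $n^{-1}X_j^T(y - X\hat{\beta})$ is the $j$-th entry of $v - C\hat{\beta}$; the toy instance is an assumption of this example.

```python
def check_lasso_kkt(C, v, lam, beta, tol=1e-9):
    """Check the conditions of Lemma A2 with C = n^{-1} X^T X and v = n^{-1} X^T y."""
    p = len(beta)
    for j, bj in enumerate(beta):
        g = v[j] - sum(C[j][k] * beta[k] for k in range(p))   # n^{-1} X_j^T (y - X beta)
        if bj != 0:
            sign = 1.0 if bj > 0 else -1.0
            if abs(g - lam * sign) > tol:      # active coordinate: equality required
                return False
        elif abs(g) > lam + tol:               # inactive coordinate: bound required
            return False
    return True


C = [[1.0, 0.0], [0.0, 1.0]]
v = [3.0, 1.0]
print(check_lasso_kkt(C, v, 2.0, [1.0, 0.0]))   # soft-thresholded v
print(check_lasso_kkt(C, v, 2.0, [2.0, 0.0]))   # violates the equality condition
```

In this toy instance with $C$ equal to the identity, the vector $(1, 0)$ passes the check; it is the coordinate-wise soft-thresholding of $v$ at level $\lambda = 2$.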
To prove 1 (a), we assume that [Mult] holds and show that the Dantzig selector has multiple
solutions. Let $\hat{\beta} = \beta^0$, $\hat{\mu} = \mu^0$, and $A, B \subseteq \{1, \dots, p\}$ be
as in [Mult] and take $u \in \mathbb{R}^p \setminus \{0\}$ so that $u_{\bar{A}} = 0$ and
$n^{-1}X_B^TX_Au_A = 0$, where $\bar{A}$ is the complement of $A$ in $\{1, \dots, p\}$. Then it is
clear from Lemma A1 that $\hat{\beta} = \beta^0$ is a solution to the Dantzig selector.
Furthermore, using Lemma A1, it is easy to check that $\hat{\beta} = \beta^0 + tu$ is a solution
to the Dantzig selector for $t \in \mathbb{R}$ sufficiently small (take $\hat{\mu} = \mu^0$).

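Now suppose that $\beta^1, \beta^2 \in \mathbb{R}^p$ are distinct solutions to the Dantzig
selector and let $\mu^1, \mu^2 \in \mathbb{R}^p$ be vectors such that $\hat{\beta} = \beta^i$ and
$\hat{\mu} = \mu^i$, $i = 1, 2$, satisfy (6)-(9). Without loss of generality, assume that
$s = \mathrm{sign}(\beta^1) = \mathrm{sign}(\beta^2)$ and
$r = \mathrm{sign}(\mu^1) = \mathrm{sign}(\mu^2)$, where we define
$\mathrm{sign}(\beta)_j = \mathrm{sign}(\beta_j) = \beta_j/|\beta_j|$ or 0, according to
$\beta_j \ne 0$ or $\beta_j = 0$, for $\beta \in \mathbb{R}^p$. Let
$A = \{j;\ \beta_j^1 \ne 0\} = \{j;\ \beta_j^2 \ne 0\}$ and
$B = \{j;\ \mu_j^1 \ne 0\} = \{j;\ \mu_j^2 \ne 0\}$. Then (7)-(8) imply that
$\|n^{-1}X^TX_B\mu_B^i\|_\infty \le 1$ and $n^{-1}X_A^TX_B\mu_B^i = s_A \in \{\pm 1\}^{|A|}$,
$i = 1, 2$. Additionally, (9) implies that $n^{-1}X_B^TX_A\beta_A^1 = n^{-1}X_B^TX_A\beta_A^2$.
Hence, $\dim[\mathrm{null}(n^{-1}X_B^TX_A)] > 0$. It follows that $n^{-1}X^TX$ is parallel to the
$\ell^1$-ball.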

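Finally, to prove Proposition 1 (c), suppose

$$\beta^1, \beta^2 \in \operatorname*{argmin}_{\beta} \frac{1}{2n}\|y - X\beta\|^2 + \lambda\|\beta\|_1$$

are distinct and suppose without loss of generality that
$s = \mathrm{sign}(\beta^1) = \mathrm{sign}(\beta^2)$. Let
$A = \{j;\ \beta_j^1 \ne 0\} = \{j;\ \beta_j^2 \ne 0\}$ and $u = \beta^2 - \beta^1$. Notice that for
$0 \le t \le 1$ we have

$$\frac{1}{2n}\left\{2t\,u^TX^T(y - X\beta^1) - t^2u^TX^TXu\right\} = \frac{1}{2n}\left\{\|y - X\beta^1\|^2 - \|y - X(\beta^1 + tu)\|^2\right\} = \lambda t\,s^Tu. \qquad (10)$$

Since (10) must hold for all $0 \le t \le 1$ and since $\lambda > 0$, we must have
$Xu = X_Au_A = 0$ and $s^Tu = 0$. It follows that

$$\dim[\mathrm{null}(n^{-1}X^TX_A)] > 0 \qquad (11)$$

and $\|\beta^1\|_1 = \|\beta^2\|_1$.

Now, let $w = \lambda^{-1}[(X^TX)^{-}X^Ty - \beta^1] \in \mathbb{R}^p$, where $(X^TX)^{-}$ is the
Moore-Penrose pseudoinverse of $X^TX$. Then Lemma A2 implies that

$$n^{-1}\|X^TXw\|_\infty = \frac{1}{n\lambda}\|X^T(y - X\beta^1)\|_\infty \le 1$$

and $n^{-1}X_A^TXw = (n\lambda)^{-1}X_A^T(y - X\beta^1) \in \{\pm 1\}^{|A|}$. Proposition 1 (c)
follows from these observations plus (11). ∎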



L. Dicker and X. Lin/Uniqueness and asymptotics for the Dantzig selector 13 

Proof of Proposition 2. To prove Proposition 2, we make use of the following lemma.

Lemma A3. Suppose that $n \ge p$ and that the rows of $X$ are iid and drawn from a distribution
which is continuous with respect to Lebesgue measure on $\mathbb{R}^p$. Let $W$ be an
$n \times q$ matrix of rank $q \le n$. Then $X^TW$ has rank $\min\{q, p\}$ with probability 1.

Proof of Lemma A3. Let $X$ and $W$ be as in the statement of the lemma. Without loss of
generality, suppose that $q = p$. When $p = 1$, the result is true. For $p > 1$, let
$[p] = \{1, \dots, p\}$. To facilitate a proof by induction, assume that
$X_{[p-1]}^TW_{[p-1]}$ has rank $p - 1$ with probability 1. On the event that
$X_{[p-1]}^TW_{[p-1]}$ has rank $p - 1$, the rank of $X^TW$ is less than $p$ if and only if

$$X_p^T\left\{I - W_{[p-1]}(X_{[p-1]}^TW_{[p-1]})^{-1}X_{[p-1]}^T\right\}W_p = 0, \qquad (12)$$

where $X_p = X_{\{p\}}$ and $W_p = W_{\{p\}}$. Since $W$ has full rank, it follows that

$$\left\{I - W_{[p-1]}(X_{[p-1]}^TW_{[p-1]})^{-1}X_{[p-1]}^T\right\}W_p \ne 0$$

with probability 1. Thus, conditioning on $X_{[p-1]}$ and using the fact that the conditional
distribution of $X_p$ is continuous, it follows that (12) holds with probability 0. We conclude
that $X^TW$ has rank $p$ with probability 1. □

Getting back to the proof of Proposition 2, suppose that the rows of $X$ are iid and drawn
from a distribution which is continuous with respect to Lebesgue measure on $\mathbb{R}^p$. Then
$X$ has rank $\min\{n, p\}$ with probability 1. Let $A, B \subseteq \{1, \dots, p\}$ and decompose
$A, B$ so that $A = A_0 \cup J$, $B = B_0 \cup J$, and $A_0$, $B_0$, and $J$ are disjoint. If
$|A| > n$, then $X_B^TX_A$ has a non-trivial null space. Suppose for the moment that
$|A| \le n$. When $X$ has full rank, the dimension of the null space of $X_B^TX_A$ is non-zero
if and only if

$$\dim\left(\mathrm{null}\left[X_{B_0}^T\left\{I - X_J(X_J^TX_J)^{-1}X_J^T\right\}X_{A_0}\right]\right) > 0.$$

Furthermore, if $X$ has full rank, then $\{I - X_J(X_J^TX_J)^{-1}X_J^T\}X_{A_0}$ has full rank.
Conditioning on $X_A$ and appealing to Lemma A3, it follows that the rank of
$X_{B_0}^T\{I - X_J(X_J^TX_J)^{-1}X_J^T\}X_{A_0}$ is $\min\{|A_0|, |B_0|\}$ with probability 1.
Thus the null-space of $X_B^TX_A$ is non-trivial with positive probability if and only if
$\min\{|B|, n\} < |A|$.

Now suppose that $\min\{|B|, n\} < |A|$. There are two cases: $|B| < |A| \le n$ and $n < |A|$.
In each case, the probability that there exists $w \in \mathbb{R}^{|B|}$ such that
$X_A^TX_Bw \in \{\pm 1\}^{|A|}$ is 0. We prove this for the case $|B| < |A| \le n$; the case
$n < |A|$ follows similarly. Assume that $|B| < |A| \le n$. Choose $A_1 \subseteq A_0$ such that
$|A_1| = |B_0|$ and let $\tilde{A} = J \cup A_1$. Suppose that $X_A^TX_Bw = s$ for some
$s \in \{\pm 1\}^{|A|}$ and $w \in \mathbb{R}^{|B|}$. Then, assuming that $X$ is full rank,

$$w_J = (X_J^TX_J)^{-1}(s_J - X_J^TX_{B_0}w_{B_0})$$

and

$$X_{A_1}^T\left[\left\{I - X_J(X_J^TX_J)^{-1}X_J^T\right\}X_{B_0}w_{B_0} + X_J(X_J^TX_J)^{-1}s_J\right] = s_{A_1}.$$

Thus, we have

$$w_{B_0} = \left[X_{A_1}^T\left\{I - X_J(X_J^TX_J)^{-1}X_J^T\right\}X_{B_0}\right]^{-1}\left\{s_{A_1} - X_{A_1}^TX_J(X_J^TX_J)^{-1}s_J\right\},$$

where Lemma A3 guarantees that $X_{A_1}^T\{I - X_J(X_J^TX_J)^{-1}X_J^T\}X_{B_0}$ is invertible
with probability 1. Since, conditional on $X_{\tilde{A} \cup B}$, the rows of
$X_{A_0 \setminus A_1}$ are independent and have continuous distributions with respect to
Lebesgue measure on $\mathbb{R}^{|A_0 \setminus A_1|}$, it follows that

$$X_{A_0 \setminus A_1}^T\left[\left\{I - X_J(X_J^TX_J)^{-1}X_J^T\right\}X_{B_0}w_{B_0} + X_J(X_J^TX_J)^{-1}s_J\right] \in \{\pm 1\}^{|A_0| - |A_1|}$$

with probability 0. Thus, as claimed, the probability that there exists
$w \in \mathbb{R}^{|B|}$ such that $X_A^TX_Bw \in \{\pm 1\}^{|A|}$ is 0.

The results from the last two paragraphs imply that

$$P\left(\dim\{\mathrm{null}(X_B^TX_A)\} > 0 \ \text{and}\ X_A^TX_Bw \in \{\pm 1\}^{|A|} \ \text{for some } w \in \mathbb{R}^{|B|}\right) = 0.$$

It follows that $X^TX$ is parallel to the $\ell^1$-ball with probability 0, as was to be shown. ∎

Proof of Proposition 3. For $n \in \mathbb{N}$, let $C_n, C \in \mathcal{P}_0^+$,
$v_n, v \in \mathbb{R}^p$, and $\lambda_n, \lambda \ge 0$ and assume that $C_n \to C$,
$v_n \to v$, and $\lambda_n \to \lambda$. Let $u_n = G(C_n, v_n, \lambda_n)$ and let
$u = G(C, v, \lambda)$. We show that $u_n \to u$.

Since $\sup_n \|u_n\| < \infty$, there exists a subsequence $\{u_{n_k}\}_{k=1}^\infty$ and a vector
$u_0 \in \mathbb{R}^p$ such that $u_{n_k} \to u_0$. To prove the proposition, it suffices to show
that $u_0 = u$. By continuity of the $\ell^\infty$-norm, we must have

$$\|Cu_0 - v\|_\infty \le \lambda.$$

Also, by the optimality properties of $u_{n_k}$, we must have

$$\|u_0\|_1 = \lim_{k \to \infty}\|u_{n_k}\|_1 \le \liminf_{k \to \infty}\|w_{n_k}\|_1 \qquad (13)$$

for any sequence $\{w_{n_k}\}$, with $w_{n_k} \in \mathbb{R}^p$ and

$$\|C_{n_k}w_{n_k} - v_{n_k}\|_\infty \le \lambda_{n_k}. \qquad (14)$$

We consider two cases: $\lambda = 0$ and $\lambda > 0$. First suppose $\lambda = 0$ and define
$w_{n_k} = C_{n_k}^{-1}(Cu + v_{n_k} - v)$. Then (14) holds and $w_{n_k} \to u$. From (13), it
follows that $\|u_0\|_1 \le \|u\|_1$; since $u_0$ is feasible for (4), the optimality and
uniqueness of $u$ imply that $u = u_0$. Now suppose that $\lambda > 0$ and define
$w_{n_k} = (\lambda_{n_k}/\lambda)C_{n_k}^{-1}(Cu - v) + C_{n_k}^{-1}v_{n_k}$. Then (14) holds and,
as in the previous case, we conclude that $u = u_0$. Thus, in either case, $u = u_0$, as was to
be shown. ∎
Proof of Corollary 2. The conditions $E(\epsilon_i^2) < \infty$ and
$n^{-1}\max_{1 \le i \le n}\|x_i\|^2 \to 0$ ensure that
$n^{-1/2}X^T\epsilon \xrightarrow{d} v^0 \sim N(0, \sigma^2C)$, by the Lindeberg-Feller central
limit theorem. By the Skorokhod representation theorem, we may assume without loss of generality
that $n^{-1/2}X^T\epsilon \to v^0$ almost surely.

Now let $u = \sqrt{n}(\beta - \beta^*)$ and notice that the Dantzig selector (2) is equivalent to
the optimization problem

$$\text{minimize } \|\sqrt{n}\beta^* + u\|_1 \quad \text{subject to } \|n^{-1}X^TXu - n^{-1/2}X^T\epsilon\|_\infty \le \sqrt{n}\lambda_n. \qquad (15)$$

In particular, $\hat{u} = \sqrt{n}(\hat{\beta}^{ds} - \beta^*)$ solves (15). We show that
$\hat{u} \to u^0$, the solution to (5), almost surely. This suffices to prove the corollary.

Since $n^{-1}X^TX \to C$, $n^{-1/2}X^T\epsilon \to v^0$ almost surely, and
$\sqrt{n}\lambda_n \to \lambda_0$, it follows that there is an almost surely finite random
variable $M$ such that $\|u\|_\infty < M/2$ whenever $u$ is feasible for the optimization problem
(15). Let $s = \mathrm{sign}(\beta^*)$ and notice that if
$\sqrt{n}\min\{|\beta_j^*|;\ j \in A^*\} > M$ and $u$ is feasible for (15), then
$\mathrm{sign}(\sqrt{n}\beta^* + u) = \mathrm{sign}(Ms + u)$. It follows that

$$G(n^{-1}X^TX,\ n^{-1/2}X^T\epsilon + n^{-1}X^TXMs,\ \sqrt{n}\lambda_n) = Ms + \hat{u}$$

whenever $\sqrt{n}\min\{|\beta_j^*|;\ j \in A^*\} > M$. Taking $n \to \infty$, Proposition 3 implies
that $\hat{u} \to G(C, v^0 + CMs, \lambda_0) - Ms$ almost surely and it is straightforward to
check that $u^0 = G(C, v^0 + CMs, \lambda_0) - Ms$. ∎